The presence of metallic implants in cone-beam computed tomography (CBCT) projection images introduces undesired artifacts that degrade the quality of reconstructed images. To reduce metal artifacts, projection inpainting is an essential step in many metal artifact reduction algorithms. In this work, a hybrid network combining the shifted-window (Swin) vision transformer (ViT) and a convolutional neural network is proposed as a baseline network for the inpainting task. To incorporate metal information into the Swin ViT-based encoder, metal-conscious self-embedding and neighborhood-embedding methods are investigated. Both methods improve the performance of the baseline network. Furthermore, with an appropriately chosen window size, the model with neighborhood embedding achieves the lowest mean absolute error of 0.079 in metal regions and the highest peak signal-to-noise ratio of 42.346 in CBCT projections. Finally, the effectiveness of metal-conscious embedding is demonstrated on both simulated and real cadaver CBCT data, where it enhances the inpainting capability of the baseline network.
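To make the metal-conscious embedding idea concrete, here is a minimal sketch in which a binary metal mask is concatenated as an extra channel before the patch-embedding projection of a Swin-style encoder. The class name, channel layout, and dimensions are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class MetalConsciousPatchEmbed(nn.Module):
    """Patch embedding that concatenates a binary metal mask as an extra
    input channel before projecting to the transformer token dimension.
    (Hypothetical layer, not the paper's exact design.)"""

    def __init__(self, patch_size=4, in_channels=1, embed_dim=96):
        super().__init__()
        # +1 channel for the metal mask
        self.proj = nn.Conv2d(in_channels + 1, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, projection, metal_mask):
        # projection: (B, 1, H, W) CBCT projection, metal_mask: (B, 1, H, W) in {0, 1}
        x = torch.cat([projection, metal_mask], dim=1)
        x = self.proj(x)                      # (B, embed_dim, H/ps, W/ps)
        return x.flatten(2).transpose(1, 2)   # (B, num_tokens, embed_dim)

# Example: embed a 256x256 projection together with its metal mask
embed = MetalConsciousPatchEmbed()
tokens = embed(torch.randn(2, 1, 256, 256), torch.zeros(2, 1, 256, 256))
print(tokens.shape)  # torch.Size([2, 4096, 96])
```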
The positive outcome of a trauma intervention depends on an intraoperative evaluation of the inserted metallic implants. Due to metal artifacts, the quality of this evaluation heavily depends on the performance of so-called metal artifact reduction (MAR) methods. Most of these MAR methods require a prior segmentation of the inserted metal objects. Therefore, a threshold-based segmentation method applied in the reconstructed 3D volume is typically used, despite some major drawbacks. With this publication, the potential of shifting the segmentation task to a learning-based, view-consistent 2D projection-wise method is investigated. To segment the metal, a learning-based 2D projection-wise segmentation network trained on real data acquired during a cadaver study is investigated. To overcome the drawbacks associated with 2D projection-wise segmentation, a consistency filter is proposed. The influence of the shifted segmentation domain is investigated by comparing the results of the standard fsMAR with a modified fsMAR version that uses the new segmentation masks. With a quantitative and qualitative evaluation on real cadaver data, the investigated approach shows increased MAR performance and insensitivity to metal artifacts. For cases with metal outside the reconstruction field of view or with vanishing metal traces, a significant reduction of artifacts could be shown. Thus, gains of up to roughly 3 dB w.r.t. the mean PSNR metric over all slices, and up to 9 dB for single slices, were achieved. The shown results reveal a beneficial influence of the shift to a 2D-based segmentation method on real data for the downstream use with MAR methods.
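A view-consistency filter of the kind described above can be sketched as a simple voting scheme: a voxel is kept as metal only if it projects into the predicted 2D metal mask in a sufficient fraction of views, and the filtered volume can then be re-projected to obtain consistent 2D masks. The sketch below assumes a hypothetical `project_point` callable standing in for the real CBCT projection geometry; it is not the paper's actual filter.

```python
import numpy as np

def consistency_filter(masks_2d, voxel_centers, project_point, min_agreement=0.7):
    """Keep a voxel as 'metal' only if it falls inside the predicted 2D metal
    mask in at least `min_agreement` of all views.

    masks_2d      : (num_views, H, W) boolean per-view segmentation masks
    voxel_centers : (num_voxels, 3) world coordinates of a coarse voxel grid
    project_point : hypothetical callable (view_idx, xyz) -> (row, col) pixel
                    coordinates, standing in for the real projection geometry
    """
    num_views, H, W = masks_2d.shape
    votes = np.zeros(len(voxel_centers), dtype=np.int32)

    for v in range(num_views):
        for i, xyz in enumerate(voxel_centers):
            r, c = project_point(v, xyz)
            if 0 <= r < H and 0 <= c < W and masks_2d[v, int(r), int(c)]:
                votes[i] += 1

    # Removes view-inconsistent false positives of the 2D network.
    return votes >= int(min_agreement * num_views)
```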
Fiducial markers are commonly used in navigation-assisted minimally invasive spine surgery (MISS); they help to transfer image coordinates into real-world coordinates. In practice, these markers may be located outside the field of view (FOV) due to the limited detector size of the C-arm cone-beam computed tomography (CBCT) systems used intraoperatively. As a consequence, reconstructed markers in CBCT volumes suffer from artifacts and have distorted shapes, which poses an obstacle for navigation. In this work, we propose two fiducial marker detection methods: direct detection from distorted markers (direct method) and detection after marker recovery (recovery method). To detect distorted markers directly in the reconstructed volume, an efficient automatic marker detection method using two neural networks and a conventional circle detection algorithm is proposed. For marker recovery, a task-specific learning strategy is proposed to recover markers from severely truncated data; afterwards, a conventional marker detection algorithm is applied for position detection. The two methods are evaluated on simulated and real data, and both achieve a marker registration error smaller than 0.2 mm. Our experiments demonstrate that the direct method is able to detect distorted markers accurately, and that the recovery method with task-specific learning is highly robust and generalizes well across various datasets. In addition, task-specific learning is able to accurately reconstruct other structures of interest, e.g. ribs for image-guided needle biopsy, from severely truncated data, which opens up new potential applications for CBCT systems.
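For the conventional circle-detection stage, a classical Hough transform on an axial slice is one plausible realization. The sketch below uses OpenCV's `HoughCircles` with illustrative parameters; the thresholds and radius bounds are assumptions, not the authors' exact pipeline.

```python
import cv2
import numpy as np

def detect_marker_circles(slice_2d, voxel_spacing_mm=0.5):
    """Detect circular fiducial-marker cross-sections in one axial CBCT slice
    with a classical Hough transform (illustrative parameters only)."""
    # Normalize the slice to 8-bit and suppress noise before edge detection.
    img = cv2.normalize(slice_2d, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    img = cv2.GaussianBlur(img, (5, 5), sigmaX=1.0)

    circles = cv2.HoughCircles(
        img, cv2.HOUGH_GRADIENT, dp=1,
        minDist=int(5.0 / voxel_spacing_mm),   # assume markers >= 5 mm apart
        param1=100, param2=20,
        minRadius=1, maxRadius=int(2.0 / voxel_spacing_mm))

    if circles is None:
        return np.empty((0, 3))
    return circles[0]   # each row: (x, y, radius) in pixels
```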
Charisma is considered one's ability to attract and potentially also influence others. Clearly, there can be considerable interest from an artificial intelligence's (AI) perspective to provide it with such a skill. Beyond that, a plethora of use cases opens up for the computational measurement of human charisma, such as tutoring humans in the acquisition of charisma, mediating human-to-human conversation, or identifying charismatic individuals in big social data. A number of models exist that base charisma on various dimensions, often following the idea that charisma is given if someone could and would help others. Examples include influence (could help) and affability (would help) in scientific studies, or power (could help), presence, and warmth (both would help) as a popular concept. Modelling high levels in these dimensions for humanoid robots or virtual agents seems accomplishable. Beyond that, automatic measurement also appears quite feasible given the recent advances in the related fields of Affective Computing and Social Signal Processing. Here, we therefore present a blueprint for building machines that can both appear charismatic and analyse the charisma of others. To this end, we first provide the psychological perspective, including different models of charisma and its behavioural cues. We then switch to conversational charisma in spoken language as an exemplary modality that is essential for human-human and human-computer conversations. The computational perspective then deals with the recognition and generation of charismatic behaviour by AI. This includes an overview of the state of play in the field and the aforementioned blueprint. We then name exemplary use cases of computational charismatic skills before switching to ethical aspects and concluding this overview and perspective on building charisma-enabled AI.
Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modelling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no labelled benchmark for this task. We address this gap by introducing continuous valence and arousal annotations for an existing dataset of children's stories annotated with discrete emotion categories. We collect additional annotations for this data and map the originally categorical labels to the valence and arousal space. Leveraging recent advances in Natural Language Processing, we propose a set of novel Transformer-based methods for predicting valence and arousal signals over the course of written stories. We explore several strategies for fine-tuning a pretrained ELECTRA model and study the benefits of considering a sentence's context when inferring its emotionality. Moreover, we experiment with additional LSTM and Transformer layers. The best configuration achieves a Concordance Correlation Coefficient (CCC) of .7338 for valence and .6302 for arousal on the test set, demonstrating the suitability of our proposed approach. Our code and additional annotations are made available at https://github.com/lc0197/emotion_modelling_stories.
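The reported results rely on the Concordance Correlation Coefficient; for reference, a minimal sketch of the standard CCC computation for a predicted valence or arousal trajectory is shown below (Lin's definition, not necessarily the authors' exact evaluation code).

```python
import numpy as np

def concordance_correlation_coefficient(y_true, y_pred):
    """Lin's CCC: 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

# Example on a short valence trajectory
print(concordance_correlation_coefficient([0.1, 0.4, 0.8], [0.2, 0.5, 0.7]))
```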
Recent work has reported that AI classifiers trained on audio recordings can accurately predict severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection status. Here, we undertake a large-scale study of audio-based deep learning classifiers as part of the UK government's pandemic response. We collect and analyse a dataset of audio recordings from 67,842 individuals with linked metadata, including reverse transcription polymerase chain reaction (PCR) test outcomes, of whom 23,514 tested positive for SARS-CoV-2. Subjects were recruited via the UK government's National Health Service Test-and-Trace programme and the REal-time Assessment of Community Transmission (REACT) randomised surveillance survey. In an unadjusted analysis of our dataset, AI classifiers predict SARS-CoV-2 infection status with high accuracy (Receiver Operating Characteristic Area Under the Curve (ROC-AUC) 0.846 [0.838, 0.854]), consistent with the findings of previous studies. However, after matching on measured confounders, such as age, gender, and self-reported symptoms, our classifiers' performance is much weaker (ROC-AUC 0.619 [0.594, 0.644]). Upon quantifying the utility of audio-based classifiers in practical settings, we find them to be outperformed by simple predictive scores based on user-reported symptoms.
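The comparison described above boils down to computing ROC-AUC on an unadjusted test set, on a confounder-matched subset, and for a naive symptom-based score. A hedged sketch follows; the column names are illustrative placeholders and the matching step itself is omitted.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def evaluate(df: pd.DataFrame) -> dict:
    """Compare an audio classifier's scores against a naive symptom count.
    Assumed columns: 'label' (0/1 PCR outcome), 'audio_score' (classifier
    output), 'n_symptoms' (self-reported symptom count)."""
    return {
        "audio ROC-AUC": roc_auc_score(df["label"], df["audio_score"]),
        "symptom-count ROC-AUC": roc_auc_score(df["label"], df["n_symptoms"]),
    }

# Unadjusted test set vs. a subset matched on age, gender and symptoms
# (the matching, e.g. 1:1 nearest neighbour on confounders, is not shown):
# print(evaluate(test_df)); print(evaluate(matched_test_df))
```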
Since early in the coronavirus disease 2019 (COVID-19) pandemic, there has been interest in using artificial intelligence methods to predict COVID-19 infection status based on vocal audio signals, for example cough recordings. However, existing studies have limitations in terms of data collection and of the assessment of the performance of the proposed predictive models. This paper rigorously assesses state-of-the-art machine learning techniques used to predict COVID-19 infection status based on vocal audio signals, using a dataset collected by the UK Health Security Agency. This dataset includes acoustic recordings and extensive study participant metadata. We provide guidelines on testing the performance of methods that classify COVID-19 infection status based on acoustic features, and we discuss how these can be extended more generally to the development and assessment of predictive methods based on public health datasets.
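One assessment pitfall such guidelines typically address is identity leakage when several recordings come from the same participant. A minimal sketch of a participant-disjoint split using scikit-learn is shown below; it is an illustration of the general principle, not the paper's actual protocol.

```python
from sklearn.model_selection import GroupShuffleSplit

def participant_disjoint_split(recordings, participant_ids, test_size=0.2, seed=0):
    """Split recordings so that no participant appears in both the training
    and the test set, avoiding identity leakage in performance estimates."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(recordings, groups=participant_ids))
    return train_idx, test_idx
```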
The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up to help beat coronavirus' digital survey alongside demographic, self-reported symptom and respiratory condition data, and linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,794 of 72,999 participants and 24,155 of 25,776 positive cases. Respiratory symptoms were reported by 45.62% of participants. This dataset has additional potential uses for bioacoustics research, with 11.30% of participants reporting asthma and 27.20% having linked influenza PCR test results.
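Summary figures such as the percentages quoted above can be derived from the participant metadata table; the sketch below assumes hypothetical column names ('pcr_result', 'has_symptoms', 'has_asthma'), which are placeholders rather than the dataset's actual schema.

```python
import pandas as pd

def summarize(metadata: pd.DataFrame) -> dict:
    """Summarize participant metadata (column names are illustrative)."""
    return {
        "participants": len(metadata),
        "pcr_linked": int(metadata["pcr_result"].notna().sum()),
        "symptomatic_pct": round(100 * metadata["has_symptoms"].mean(), 2),
        "asthma_pct": round(100 * metadata["has_asthma"].mean(), 2),
    }
```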
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds. The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled, and to use the underlying latent vectors as input to the perception head. The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information that can be used to boost an actual perception task. This principle has a very simple formulation, which makes it both easy to implement and widely applicable to a large range of 3D sensors and deep networks performing semantic segmentation or object detection. In fact, it supports a single-stream pipeline, as opposed to most contrastive learning approaches, allowing training on limited resources. We conducted extensive experiments on various autonomous driving datasets, involving very different kinds of lidars, for both semantic segmentation and object detection. The results show the effectiveness of our method in learning useful representations without any annotation, compared to existing approaches. Code is available at https://github.com/valeoai/ALSO.
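A simplified way to picture such a reconstruction pretext task: per-point latent vectors from the backbone condition a small head that classifies whether query points sampled around the scene lie on the empty or occupied side of the surface. The sketch below is an occupancy-style approximation under that assumption, not the exact ALSO objective.

```python
import torch
import torch.nn as nn

class OccupancyPretextHead(nn.Module):
    """Predict occupancy of 3D query points from the latent vector of their
    nearest input lidar point (simplified reconstruction pretext sketch)."""

    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, latents, points, queries):
        # latents: (N, D) backbone features, points: (N, 3), queries: (M, 3)
        nearest = torch.cdist(queries, points).argmin(dim=1)   # (M,)
        offsets = queries - points[nearest]                    # query relative to anchor
        x = torch.cat([latents[nearest], offsets], dim=1)
        return self.mlp(x).squeeze(-1)                         # occupancy logits

# Pretext loss: binary cross-entropy against occupancy labels derived from the
# sensor geometry (e.g. space in front of a lidar return is known to be empty).
# loss = nn.functional.binary_cross_entropy_with_logits(head(z, pts, q), occ_labels)
```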
Natural language explanations promise to offer intuitively understandable explanations of a neural network's decision process in complex vision-language tasks, as pursued in recent VL-NLE models. While current models offer impressive performance on task accuracy and explanation plausibility, they suffer from a range of issues: Some models feature a modular design where the explanation generation module is poorly integrated with a separate module for task-answer prediction, employ backbone models trained on limited sets of tasks, or incorporate ad hoc solutions to increase performance on single datasets. We propose to evade these limitations by applying recent advances in large-scale multi-task pretraining of generative Transformer models to the problem of VL-NLE tasks. Our approach outperforms recent models by a large margin, with human annotators preferring the generated explanations over the ground truth in two out of three evaluated datasets. As a novel challenge in VL-NLE research, we propose the problem of multi-task VL-NLE and show that jointly training on multiple tasks can increase the explanation quality. We discuss the ethical implications of high-quality NLE generation and other issues in recent VL-NLE research.
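Joint multi-task VL-NLE training of a generative model can be pictured as mixing examples from several datasets into common batches and training a single sequence-to-sequence objective that emits the answer together with its explanation. The sketch below illustrates only the data mixing and target formatting; the field names and the "answer because explanation" prompt format are assumptions, not the paper's actual setup.

```python
import random

def multitask_batches(datasets, batch_size=8):
    """Mix examples from several VL-NLE datasets into joint training batches.
    Each example is assumed to carry 'answer' and 'explanation' fields
    (illustrative names) alongside its image and question."""
    pooled = [ex for name, data in datasets.items() for ex in data]
    random.shuffle(pooled)
    for i in range(0, len(pooled), batch_size):
        batch = pooled[i:i + batch_size]
        # Single generative target: answer followed by its natural-language
        # explanation, so one seq2seq loss covers both sub-tasks.
        targets = [f"{ex['answer']} because {ex['explanation']}" for ex in batch]
        yield batch, targets
```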